
    Visualisation of semantic enrichment

    Automatically creating semantic enrichments for text may lead to annotations that allow for excellent recall but poor precision. Manual enrichment is potentially more targeted, leading to greater precision. We aim to support non-experts in manually enriching texts with semantic annotations. Neither the visualisation of semantic enrichment nor the process of manually enriching texts has been evaluated before. This paper presents the results of our user study on the visualisation of text enrichment during the annotation process. We performed an extensive analysis of related work on the visualisation of semantic annotations. In a prototype implementation, we then explored two layout alternatives for visualising semantic annotations and their linkage to the text atoms. Here we summarise and discuss our results and their design implications for tools creating semantic annotations.

    Text categorization and similarity analysis: similarity measure, literature review

    Document classification and provenance have become an important area of computer science as the amount of digital information grows significantly. Organisations are storing documents on computers rather than in paper form. Software is now required that will show the similarities between documents (i.e. document classification) and point out duplicates and possibly the history of each document (i.e. provenance). Poor organisation is common and leads to situations like those described above. A number of software solutions exist in this area, designed to make document organisation as simple as possible. I'm doing my project with Pingar, a company based in Auckland that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in this area with the aim of determining what already exists and how my project will differ from existing solutions.

    Text categorization and similarity analysis: similarity measure, architecture and design

    This research looks at the most appropriate similarity measure to use for a document classification problem. The goal is to find a method that is accurate in finding both semantically and version-related documents. A necessary requirement is that the method is efficient in its speed and disk usage. Simhash is found to be the measure best suited to the application, and it can be combined with other software to increase accuracy. Pingar have provided an API that will extract the entities from a document and create a taxonomy displaying the relationships, and this extra information can be used to accurately classify input documents. Two algorithms are designed incorporating the Pingar API, and finally an efficient comparison algorithm is introduced to cut down the comparisons required.
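    The Simhash measure mentioned above can be sketched as follows. This is a generic illustration of the technique, not the project's actual implementation: the whitespace tokenisation and MD5-based token hashing here are assumptions made for the example.

```python
import hashlib

def simhash(tokens, bits=64):
    """Compute a Simhash fingerprint: similar token multisets yield
    fingerprints with a small Hamming distance."""
    votes = [0] * bits
    for token in tokens:
        # Hash each token to a stable integer (MD5 truncated to `bits` bits).
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit takes the sign of the accumulated vote.
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quick brown fox leaps over the lazy dog".split()
doc3 = "completely unrelated text about sorting networks".split()

d_near = hamming_distance(simhash(doc1), simhash(doc2))
d_far = hamming_distance(simhash(doc1), simhash(doc3))
print(d_near, d_far)
```

    Unlike ordinary hashing, a one-token edit perturbs only a few vote totals, so near-duplicate versions of a document land close together in Hamming space while unrelated documents land far apart.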

    The new (liberal) eugenics

    Despite the Nazi horrors, the new eugenics was founded in 1953, when Watson and Crick postulated the double helix of DNA as the basis of chemical heredity. In 1961, scientists deciphered the genetic code of DNA, laying the groundwork for code manipulation and the potential building of new life forms. Thirty years after the discovery of the DNA structure, experimenters began to carry out the first clinical studies of human somatic cell therapy. The practice of prenatal genetic testing identifies genes or unwanted genetic markers. Parents can choose to continue the pregnancy or give up the fetus. With preimplantation genetic diagnosis, potential parents can choose to use in vitro fertilization and then test early embryonic cells to identify embryos with genes they prefer or wish to avoid. Because of concerns about eugenics, genetic counseling is based on a "non-directive" policy to ensure respect for reproductive autonomy. The argument for this counseling service is that we should balance parental autonomy with the child's future autonomy. Specialists have not yet given a clear answer to the question of whether these practices should be considered eugenic practices, or whether they are moral practices. DOI: 10.13140/RG.2.2.28777.9584

    Text categorization and similarity analysis: implementation and evaluation

    This report covers the implementation of software that aims to identify document versions and semantically related documents. This is important due to the increasing amount of digital information. Key criteria were that the software was fast and required limited disk space. Previous research determined that the Simhash algorithm was the most appropriate for this application, so this method was implemented. The structure of each component was well defined, with constant inputs and outputs, and the result was a software system whose parts can be interchanged if required.
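    A standard way to cut down pairwise fingerprint comparisons, as the earlier design report calls for, is to bucket fingerprints by bit-bands: by the pigeonhole principle, two fingerprints within Hamming distance bands−1 must agree exactly on at least one band, so only bucket collisions need a full comparison. The sketch below is an illustrative implementation of that general technique, with made-up document IDs; it is not taken from the report's actual code.

```python
from collections import defaultdict

def band_index(fingerprints, bands=4, bits=64):
    """Bucket fingerprints by each of their `bands` equal-width bit-bands."""
    width = bits // bands
    mask = (1 << width) - 1
    index = defaultdict(set)
    for doc_id, fp in fingerprints.items():
        for b in range(bands):
            chunk = (fp >> (b * width)) & mask
            index[(b, chunk)].add(doc_id)
    return index

def candidate_pairs(fingerprints, bands=4, bits=64):
    """Return only the pairs that share a band; all pairs within
    Hamming distance bands-1 are guaranteed to be included."""
    index = band_index(fingerprints, bands, bits)
    pairs = set()
    for bucket in index.values():
        ids = sorted(bucket)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs

# Hypothetical 16-bit fingerprints: "a" and "b" differ only in the low band,
# while "c" shares no band with either.
fps = {"a": 0x1234, "b": 0x1235, "c": 0xFFFF}
print(candidate_pairs(fps, bands=4, bits=16))
```

    With 4 bands over a 64-bit Simhash, this filters for candidates within Hamming distance 3, replacing an all-pairs scan with hash lookups.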

    Special Issue on Generic Programming Editorial


    Unifying Structured Recursion Schemes


    Browsing and book selection in the physical library shelves

    Library users should be able to interact conveniently with collections and easily choose books of interest as they explore and browse a physical book collection. While there exists a growing body of naturalistic studies of browsing and book selection in digital collections, the corresponding literature on behaviour in the physical stacks is surprisingly sparse. In this paper we add to this literature by conducting observations of patrons in a university library as they selected books from the shelves. Our aim is to further our understanding of patterns of behaviour in browsing and selection in physical collections.

    Parberry’s pairwise sorting network revealed

    Batcher’s “merge exchange” sorting network, discussed in a previous pearl (Hinze & Martin, 2016), remains one of the best practical algorithms for oblivious sorting, even almost half a century after its inception. So it is surprising that an algorithm with exactly the same level of performance, devised two decades later by Parberry (1992), has been relatively overlooked. Perhaps a reason for its lack of celebrity is that Parberry’s design is not immediately recognizable, whereas the Batcher method has a familiar ring, as a hardwired implementation of merge sort. Here we hope to rectify this imbalance by unravelling Parberry’s algorithm and uncovering its close relationship to Batcher’s. Interestingly, Parberry derives his network using the zero-one principle (Knuth, 1998). We abandon this traditional method in favour of a feature of comparison networks that we consider to be more fundamental: monotonicity. We shall see that this property, used before to demystify Batcher’s merger (Hinze & Martin, 2016), also helps to shed some light on Parberry’s design. To keep the pearl reasonably self-contained we start with a quick recap of the notation and Batcher’s construction.
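    Both constructions discussed in the pearl are comparison networks, and the zero-one principle the abstract cites (Knuth, 1998) gives a cheap exhaustive check: a network sorts all inputs if and only if it sorts every sequence of 0s and 1s. As an illustrative sketch, not the pearl's own derivation, the following generates one common recursive formulation of Batcher's odd-even merge sort for eight inputs and verifies it that way:

```python
from itertools import product

def oddeven_merge(lo, hi, r):
    """Comparators merging two sorted halves of x[lo..hi] (hi inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)

def oddeven_merge_sort(lo, hi):
    """Comparators sorting x[lo..hi]; the length must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)

def apply_network(network, xs):
    """Run a comparison network: each comparator (i, j) swaps
    out-of-order elements, obliviously to the data."""
    xs = list(xs)
    for i, j in network:
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
    return xs

n = 8
net = list(oddeven_merge_sort(0, n - 1))

# Zero-one principle: checking all 2^n bit sequences suffices.
ok = all(apply_network(net, bits) == sorted(bits)
         for bits in product([0, 1], repeat=n))
print(len(net), ok)
```

    The comparator sequence is fixed in advance and independent of the data, which is what makes such networks "oblivious" and suitable for hardware. Parberry's pairwise network uses exactly as many comparators as Batcher's, which is the equal performance the abstract refers to.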